An evaluation of classification models for question topic categorization

Authors

  • Bo Qu
  • Gao Cong
  • Cuiping Li
  • Aixin Sun
  • Hong Chen
Abstract

We study the problem of question topic classification using a very large real-world Community Question Answering (CQA) dataset from Yahoo! Answers. The dataset contains 3.9M questions, organized into a hierarchy of more than one thousand categories. To the best of our knowledge, this is the first systematic evaluation of the performance of different classification methods on question topic classification as well as on short texts. Specifically, we empirically evaluate the following in classifying questions into CQA categories: 1) the usefulness of n-gram features and bag-of-words features; 2) the performance of three standard classification algorithms (Naïve Bayes, Maximum Entropy, and Support Vector Machines); 3) the performance of state-of-the-art hierarchical classification algorithms; 4) the effect of training data size on performance; and 5) the effectiveness of the different components of CQA data, including subject, content, asker, and the best answer. The experimental results show which aspects are important for question topic classification in terms of both effectiveness and efficiency. We believe that the experimental findings from this study will be useful in real-world classification problems.
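As an illustration of two of the ingredients named in the abstract (word n-gram features and a Naïve Bayes classifier), here is a minimal sketch in plain Python. It is not the authors' implementation: the tokenizer, the add-one smoothing, the function and class names, and the four toy questions with their category labels are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def ngram_features(text, n_max=2):
    """Return all word n-grams of length 1..n_max (lowercased, whitespace-tokenized)."""
    tokens = text.lower().split()
    feats = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats.append(" ".join(tokens[i:i + n]))
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing (illustrative only)."""

    def fit(self, texts, labels, n_max=1):
        self.n_max = n_max
        self.class_counts = Counter(labels)       # number of documents per class
        self.feat_counts = defaultdict(Counter)   # feature counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            feats = ngram_features(text, n_max)
            self.feat_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            score = math.log(doc_count / total_docs)  # log prior
            denom = sum(self.feat_counts[label].values()) + vocab_size
            for feat in ngram_features(text, self.n_max):
                if feat in self.vocab:  # features unseen in training are skipped
                    score += math.log((self.feat_counts[label][feat] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy questions and categories, invented for this sketch.
train = [
    ("how do i fix my laptop screen", "Computers"),
    ("best laptop for programming", "Computers"),
    ("how do i cook rice", "Food"),
    ("best recipe for chicken soup", "Food"),
]
texts = [t for t, _ in train]
labels = [c for _, c in train]

bow_model = NaiveBayes().fit(texts, labels, n_max=1)     # bag-of-words (unigrams)
bigram_model = NaiveBayes().fit(texts, labels, n_max=2)  # unigrams + bigrams

print(bow_model.predict("laptop screen repair"))
print(bigram_model.predict("how to cook chicken"))
```

Switching `n_max` between 1 and 2 is the kind of feature-set comparison the abstract describes in item 1; a real evaluation would use held-out data and far more than four training questions.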


Similar resources

Sparse Structured Principal Component Analysis and Model Learning for Classification and Quality Detection of Rice Grains

In scientific and commercial fields associated with modern agriculture, the categorization of different rice types and determination of its quality is very important. Various image processing algorithms are applied in recent years to detect different agricultural products. The problem of rice classification and quality detection in this paper is presented based on model learning concepts includ...


A Novel Approach To Focus Identification In Question/Answering Systems

Modern Question/Answering systems rely on expected answer types for processing questions. The answer type is a semantic category provided by Named Entity recognizer or by semantic hierarchies. We argue in this paper that Q/A systems should take advantage of the topic information by exploiting several models of question and answer categorization. The matching of the question category with the an...


A Joint Semantic Vector Representation Model for Text Clustering and Classification

Text clustering and classification are two main tasks of text mining. Feature selection plays the key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researches to use...


An Empirical Comparison of Text Categorization Methods

In this paper we present a comprehensive comparison of the performance of a number of text categorization methods in two different data sets. In particular, we evaluate the Vector and Latent Semantic Analysis (LSA) methods, a classifier based on Support Vector Machines (SVM) and the k-Nearest Neighbor variations of the Vector and LSA models. We report the results obtained using the Mean Recipro...


Annotated Bibliography Content-Based Image Retrieval: Performance Evaluation and Semantic Scene Understanding

The general topic of my research, and thus also this annotated bibliography, is content-based image retrieval (CBIR). Within CBIR, I picked out two areas that seem crucial to me: performance evaluation of CBIR systems and semantic scene understanding or scene categorization as a pre-step towards CBIR based on the automated annotation of scenes. In Section 2, some overviews of work done in CBIR a...



Journal:
  • JASIST

Volume 63

Publication date: 2012